Working with the GCDF Dataset:

A New User’s Perspective

Teal Emery

Agenda

  1. Working with the GCDF data as a new user: joys, pain points, and ideas for marginal improvements.
  2. chinadevfin2 - an R package for working with the GCDF 2.0 dataset.
  3. Group discussion.

1. Working with the GCDF data as a new user

Context

How I became a connoisseur of large international datasets…

  1. Private sector
  2. Policy research
  3. Teaching

Overall Assessment

The GCDF 2.0 is a well organized dataset with A LOT of amazing data. The methodology and variables are well documented. The pain points I’ve experienced working with it are common to most large datasets:

  1. Getting the data ready for analysis is time consuming.
  2. Country names are not presented in a standardized format.

Three Easy Improvements

You work hard to create these datasets. What are some easy-to-implement changes to enhance their use?

  1. Release the dataset as a .csv file in addition to the .xlsx to make it easier to import into data analysis software.
  2. Add country ISO3C codes for recipient countries to make it easier to combine the GCFD data with complementary datasets.
  3. Create a “cookbook” demonstrating simple examples of how users can use the data to gain insights, and answers to common challenges. This can ease adoption of the dataset.

2. Introducing the chinadevfin2 R Package

chinadevfin2

Caveats

  1. So far, this is a personal project.
  2. It is a minimum viable product.
  3. It won’t take too much effort to adapt this to the new GCDF 3.0 dataset.

Let’s Explore

  1. chinadevfin2 R package website
  2. Getting Started tutorial
  3. A simple interactive web app built using chinadevfin2

3. Group Discussion